Women have traditionally embraced multiple roles, including mother, daughter, wife, sister and friend, whether by choice or out of necessity. In today’s world, the role of women has evolved significantly and positively: many women are now educated, self-reliant and financially independent, and they have various avenues for personal fulfillment and joy. This project explores these avenues to uncover the words and topics linked to women’s happiness. The analysis comprises four sections: analyzing women’s happiness overall, and comparing it between women at different life stages (married/unmarried), in different global regions (developed/undeveloped) and across age groups (20s/30s/40+).
# Importing the necessary libraries
# Install the gaospecial fork of wordcloud2 from GitHub before loading it
# (this only needs to run once per machine)
devtools::install_github("gaospecial/wordcloud2")
library(tm)
library(tidytext)
library(tidyverse)   # also loads dplyr, ggplot2, tibble and readr, among others
library(DT)
library(scales)
library(countrycode)
library(NLP)
library(topicmodels)
library(wordcloud2)
library(gridExtra)
library(ngram)       # provides wordcount()
# Reading "processed_moments.csv" created in data_preprocessing.R (in lib folder)
hm_data <- read_csv("~/Desktop/processed_moments.csv")
# Reading the demographics file
demographics <- 'https://raw.githubusercontent.com/rit-public/HappyDB/master/happydb/data/demographic.csv'
demo_data <- read_csv(demographics)
Happiness takes on various meanings for different women. In this analysis, I explore these meanings by examining the words women use to describe their happy moments.
# Building a data frame of happy moments written by women by joining hm_data with demo_data
all_women <- hm_data %>%
  inner_join(demo_data, by = "wid") %>%
  filter(gender == "f") %>%
  select(wid,
         original_hm,
         gender,
         marital,
         parenthood,
         reflection_period,
         age,
         country,
         ground_truth_category,
         text) %>%
  # Word count of each processed moment, computed on the joined and filtered rows
  # (not on hm_data$text, whose rows no longer line up with all_women)
  mutate(count = sapply(text, wordcount, USE.NAMES = FALSE))
datatable(all_women)
# Creating a bag of words using the text data
bag_of_words_all_women <- all_women %>%
unnest_tokens(word, text)
word_count_all_women <- bag_of_words_all_women %>%
count(word, sort = TRUE)
# making a word cloud
wordcloud2(word_count_all_women, size = 0.6, rotateRatio = 0)
From the word cloud, it is evident that women derive happiness from various sources. The most prominent among these is the companionship of their friends, indicating the importance of social connections. Additionally, family members, including husbands and children, play a significant role in contributing to women’s happiness. The presence of words like “surprise,” “celebrated,” and “enjoyed” suggests that women also associate happiness with moments of celebration and enjoyment.
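The bigram chart that follows plots a `top_bigrams_all_women` data frame with columns `word1`, `word2` and `n`. The report does not show how that data frame is built. One plausible construction is sketched below, assuming the standard tidytext approach: tokenize each moment into adjacent word pairs, split each pair into two columns, count the pairs and keep the 10 most frequent. The helper name `top_bigrams()` and the toy data are hypothetical, used for illustration only.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical helper (not from the original report): the k most frequent
# bigrams in the `text` column of a data frame of happy moments
top_bigrams <- function(df, k = 10) {
  df %>%
    unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%   # adjacent word pairs
    filter(!is.na(bigram)) %>%                                  # one-word moments give NA
    separate(bigram, into = c("word1", "word2"), sep = " ") %>%
    count(word1, word2, sort = TRUE) %>%                        # pair frequencies
    slice_max(n, n = k, with_ties = FALSE)                      # keep exactly k rows
}

# Toy moments standing in for all_women (illustrative data only)
toy_moments <- tibble(text = c("celebrated birthday party",
                               "birthday party friend",
                               "dinner husband"))
toy_bigrams <- top_bigrams(toy_moments)
```

Under this assumption, the report's object would be `top_bigrams_all_women <- top_bigrams(all_women)`. The same helper could also be reused for the married/unmarried, regional and age-group comparisons.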
# Plotting the 10 most frequent bigrams (pairs of adjacent words)
ggplot(top_bigrams_all_women, aes(x = reorder(paste(word1, word2, sep = " "), n), y = n)) +
  geom_col(fill = "blue") +
  labs(x = "Pair of Words", y = "Frequency") +
  coord_flip() +
  ggtitle("Top 10 Pairs of Words Women Associate Happiness With") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
From the bigram visualization, celebrating birthdays appears to be a significant source of joy for some women. This is followed by word pairs that refer to family and friends. Next, let’s find the single word most commonly used to describe happiness among women by examining unigram frequencies in the all_women dataset.
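The unigram chart that follows uses a `top_words_all_women` object whose construction is not shown. Because `word_count_all_women` (built earlier with `count(word, sort = TRUE)`) already holds per-word frequencies, a minimal sketch only needs to keep its top rows. The toy counts below are hypothetical.

```r
library(dplyr)

# Toy stand-in for word_count_all_women (hypothetical counts, for illustration only)
toy_word_counts <- tibble(word = c("friend", "family", "dinner", "son", "movie"),
                          n    = c(120, 95, 60, 40, 10))

# Keep the k most frequent words; with_ties = FALSE guarantees exactly k rows
top_words_toy <- toy_word_counts %>%
  slice_max(n, n = 3, with_ties = FALSE)
```

Applied to the report's data, this would presumably be `top_words_all_women <- word_count_all_women %>% slice_max(n, n = 10, with_ties = FALSE)`.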
# Plotting the 10 most frequent single words (unigrams)
ggplot(top_words_all_women, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "darkgreen") +
  labs(x = "Word", y = "Frequency") +
  coord_flip() +
  ggtitle("Top 10 Words Tied to Women's Happiness") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
As anticipated, the unigram analysis reveals that “friend” and “family” are the primary elements that women associate with happiness.
Next, I perform topic modelling using Latent Dirichlet Allocation (LDA) to identify the top 10 topics women associate with happiness.
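The code that follows references `lda_model_all_women` and `lda_terms_all_women` without showing how they were fitted. A minimal sketch of one common workflow is shown below, assuming tidytext and topicmodels are used together: count words per moment, cast the counts to a document-term matrix, then fit LDA. The report uses 10 topics, while the toy data here uses `k = 2`. The `doc_id` column and the toy moments are hypothetical. Topics are joined back by document ID rather than by row position, which guards against misaligned rows.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Toy stand-in for all_women (hypothetical moments, for illustration only)
toy_moments <- tibble(text = c(
  "family dinner husband", "shopping mall dress", "son birthday party",
  "hiking nature trail", "family vacation beach", "dress shopping sale"
)) %>%
  mutate(doc_id = row_number())   # one ID per happy moment

# Document-term matrix: one row per moment, one column per word
dtm <- toy_moments %>%
  unnest_tokens(word, text) %>%
  count(doc_id, word) %>%
  cast_dtm(document = doc_id, term = word, value = n)

# Fit LDA; a fixed seed makes the result repeatable
lda_model <- LDA(dtm, k = 2, control = list(seed = 1234))

# Top 5 terms per topic: a 5 x 2 character matrix
lda_terms <- terms(lda_model, 5)

# Most likely topic per moment, matched back by document ID
topic_assignments <- tibble(doc_id = as.integer(rownames(dtm)),
                            topic  = unname(topics(lda_model, 1)))
toy_moments <- left_join(toy_moments, topic_assignments, by = "doc_id")
```

The topic labels used in the bar chart (Family, Food and so on) are an interpretation of each topic's top terms, not something LDA outputs.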
# Top 10 topics (with 10 terms each) women associate happiness with
head(lda_terms_all_women, 10)
# Assigning each happy moment to its most likely topic and plotting the topic counts
# (this assumes lda_topics has one entry per row of all_women, in the same order)
lda_topics <- topics(lda_model_all_women, k = 1)
all_women$topic <- as.factor(lda_topics)
ggplot(data = all_women) +
  geom_bar(aes(x = topic, fill = topic)) +
  scale_fill_discrete(name = "Topics",
                      labels = c("1. Family", "2. Food", "3. Children", "4. Job",
                                 "5. Education", "6. Reading",
                                 "7. Shopping", "8. Nature", "9. Celebration", "10. Entertainment")) +
  xlab("Topics") +
  ylab("Number of Happy Moments") +
  ggtitle("Top 10 Topics Tied to Women's Happiness") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))
Overall, women find happiness in family and friends. They also draw joy from other aspects of life, including work, shopping and nature, which suggests an ongoing exploration of diverse sources of joy. But do these topics change when women are split into subgroups? The following sections explore this, starting with a comparison of the sources of happiness of unmarried and married women.
Note that single, widowed and divorced women are considered unmarried for the purposes of this analysis.
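The split itself is not shown in the report. A minimal sketch of the rule stated above filters on the `marital` column. The toy rows are hypothetical. If the data also contains another value such as "separated", it falls into neither group under this rule. That would need an explicit decision.

```r
library(dplyr)

# Hypothetical toy rows mirroring the marital column of all_women
toy_women <- tibble(
  wid     = 1:5,
  marital = c("married", "single", "divorced", "widowed", "separated")
)

# Married vs. unmarried, following the definition stated in the text
married_women   <- toy_women %>% filter(marital == "married")
unmarried_women <- toy_women %>% filter(marital %in% c("single", "widowed", "divorced"))
```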
wordcloud2(word_count_unmarried_women, size = 0.6, rotateRatio = 0)
Friends continue to be a big source of joy. This is followed by words that resonate with themes of relationship, celebration and shopping.
wordcloud2(word_count_married_women, size = 0.6, rotateRatio = 0)
Words such as “husband”, “daughter” and “son” suggest that married women place high importance on family relationships.
For unmarried women, the bigram visualization highlights family, friends and celebrations as central themes. They also appear to derive personal fulfillment from activities such as watching TV and job interviews.
For married women, happiness appears to stem primarily from family relationships.
The unigram charts reflect the same themes as the bigram charts.
Next, I perform topic modelling using Latent Dirichlet Allocation (LDA) to identify the top 10 topics that married and unmarried women associate with happiness.
Overall, unmarried women tend to derive happiness from topics such as nature, education and work. Children, on the other hand, are a constant source of joy for married women, who also seem to have an additional appreciation for food. Despite these differences, family emerges as the most significant source of joy for both groups.
The next plot is a scatterplot comparing the happy moments of unmarried and married women.
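The scatterplot itself is not reproduced here, and its code is not shown. A common way to build such a comparison is sketched below. It assumes each word is plotted by its share of the unmarried group's words (x) against its share of the married group's words (y), on log scales with percent labels from the `scales` package loaded at the top. The toy data and column names are hypothetical. Words near the dashed diagonal are used about equally often by both groups. Words far from it lean towards one group.

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(ggplot2)
library(scales)

# Toy moments labelled by group (illustrative data only)
toy_moments <- tibble(
  group = c("married", "married", "unmarried", "unmarried"),
  text  = c("husband dinner son", "daughter dinner friend",
            "friend movie shopping", "friend dinner job")
)

# Share of each group's words taken up by each word, one column per group
word_props <- toy_moments %>%
  unnest_tokens(word, text) %>%
  count(group, word) %>%
  group_by(group) %>%
  mutate(proportion = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = group, values_from = proportion) %>%
  filter(!is.na(married), !is.na(unmarried))   # keep words used by both groups

# Dashed line y = x marks words used equally often by both groups
comparison_plot <- ggplot(word_props, aes(x = unmarried, y = married)) +
  geom_abline(linetype = "dashed", colour = "grey50") +
  geom_jitter(alpha = 0.3, width = 0.1, height = 0.1) +
  geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
  scale_x_log10(labels = percent_format()) +
  scale_y_log10(labels = percent_format()) +
  labs(x = "Unmarried women", y = "Married women") +
  theme_minimal()
```

Words used by only one group are dropped here, because a zero proportion cannot be shown on a log scale.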
Note that the definitions of ‘developed’ and ‘undeveloped’ regions follow the United Nations’ classification of more developed and less developed regions:
https://population.un.org/wpp/DefinitionOfRegions/
For this analysis, I use the countrycode library to identify the region that each female respondent belongs to. Based on each region’s development status as defined by the United Nations, respondents are then split into developed and undeveloped categories. The outputs below list the country codes found in the data, followed by the corresponding region names.
## [1] "USA" "DNK" "IND" "KWT" "FIN" "VEN" "CAN" "IRL" "GBR" "JAM" "ESP" NA
## [13] "MEX" "ARM" "NGA" "PHL" "GRC" "LTU" "BGR" "TUR" "DZA" "IDN" "ZAF" "AUT"
## [25] "LKA" "PAK" "NZL" "SRB" "ETH" "PRI" "NIC" "NLD" "EGY" "AUS" "BEL" "DEU"
## [37] "ITA" "ASM" "THA" "UGA" "ARE" "JPN" "DOM" "UMI" "CYP" "PRT" "MYS" "FRA"
## [49] "BRB" "CZE" "BHS" "ISL" "SUR" "MKD" "TCA" "TTO" "SGP" "BRA" "ZMB" "AFG"
## [61] "TWN" "VIR" "SLV" "GTM" "NOR" "COL" "MDA"
## [1] "Northern America" "Northern Europe"
## [3] "Southern Asia" "Western Asia"
## [5] "South America" "Caribbean"
## [7] "Southern Europe" NA
## [9] "Central America" "Western Africa"
## [11] "South-Eastern Asia" "Eastern Europe"
## [13] "Northern Africa" "Southern Africa"
## [15] "Western Europe" "Australia and New Zealand"
## [17] "Eastern Africa" "Polynesia"
## [19] "Eastern Asia"
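The mapping from the region names printed above to the two categories is not shown. A minimal sketch is given below. It assumes the UN World Population Prospects grouping, in which all regions of Europe, Northern America, Australia/New Zealand and Japan count as "more developed". Because Japan is the only developed country inside "Eastern Asia", it is handled by its ISO3 code. The helper name `classify_development()` is hypothetical. Respondents with an unknown country (the `NA` entries above) stay unclassified.

```r
# Regions treated as developed, assuming the UN "more developed regions" grouping
developed_regions <- c("Northern America", "Northern Europe", "Southern Europe",
                       "Eastern Europe", "Western Europe", "Australia and New Zealand")

# Hypothetical helper: region name + ISO3 code -> "developed" / "undeveloped"
classify_development <- function(region, iso3c) {
  out <- ifelse(region %in% developed_regions | iso3c %in% "JPN",
                "developed", "undeveloped")
  out[is.na(region) & !(iso3c %in% "JPN")] <- NA   # unknown country stays unknown
  out
}

status <- classify_development(
  region = c("Northern America", "Southern Asia", "Eastern Asia", "Eastern Asia", NA),
  iso3c  = c("USA", "IND", "JPN", "TWN", NA)
)
```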
wordcloud2(word_count_all_women_developed, size = 0.6, rotateRatio = 0)
wordcloud2(word_count_all_women_undeveloped, size = 0.6, rotateRatio = 0)
Comparing the two word clouds above, women in developed regions appear to find happiness in family, a recurring theme in this analysis. A notable difference, however, is the prominence of the word “home” in the word cloud of women in developed regions. This may be because women in developed regions more often have the option of staying at home, whereas those in undeveloped regions may need to work to support their families.
Both visualizations support this observation: women in developed regions may be more able to stay at home and pursue personally fulfilling activities, an option that may not be available to women in undeveloped regions. The happiness of the latter therefore appears to come more from activities outside the home, such as work.
Let’s proceed with topic modelling using Latent Dirichlet Allocation (LDA) to identify the top 10 topics that women in these regions associate with happiness.
Overall, women in developed countries often find happiness in home-based activities and have the means to enjoy vacations with their families. In contrast, women in undeveloped countries, while also valuing shopping, may not have the same opportunities for extravagant vacations.
The next plot is a scatterplot comparing the happy moments of women in these two regions.
Now, we split the data into three groups: twenties (ages 20 to 29), thirties (ages 30 to 39) and forties_and_over (ages 40 to 99). Teenagers are excluded from this analysis, since many of the topics, such as job and children, are unlikely to apply to them. The age column also contains implausible outliers, such as 227 and 233, so ages must be below 100. The overall range considered is therefore 20 to 99.
## [1] 30 28 55 41 25 32 38 47 79 29 64 34 26 61 27 33 31 23 44
## [20] 22 21 35 45 46 62 65 36 39 52 53 42 57 43 20 24 48 37 51
## [39] 49 66 58 56 60 50 54 18 40 19 71 98 NA 70 227 68 59 69 67
## [58] 2 63 88 77 74 81 233 75 95 4 72 80 84 17 3 73
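The grouping code is not shown. A minimal sketch of the rule described above uses `cut()` with `right = FALSE`, so each interval includes its lower bound and excludes its upper bound: [20, 30), [30, 40) and [40, 100). The toy ages are hypothetical. The `as.numeric()` step is only a safeguard, in case `age` was read as text.

```r
library(dplyr)

# Hypothetical toy ages, including outliers, a teenager and a missing value
toy_ages <- tibble(age = c("30", "28", "55", "233", "17", NA, "40", "99", "100", "20"))

age_groups <- toy_ages %>%
  mutate(age = suppressWarnings(as.numeric(age))) %>%   # non-numeric entries become NA
  filter(age >= 20, age < 100) %>%                     # drops teenagers, outliers and NA
  mutate(age_group = cut(age, breaks = c(20, 30, 40, 100), right = FALSE,
                         labels = c("twenties", "thirties", "forties_and_over")))
```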
wordcloud2(word_count_twenties, size = 0.6, rotateRatio = 0)
wordcloud2(word_count_thirties, size = 0.6, rotateRatio = 0)
Across the progression of the word clouds, the word “friend” becomes less prominent as women move from their 20s to older age groups, while the emphasis on family increases. Celebrations also seem to decrease in significance, as middle-aged women appear to prefer spending time at home with their children.
In the bigram visualizations, it’s evident that as women age, their personal celebrations, such as birthdays and college journeys, are gradually replaced by those of their children. This shift suggests that women ultimately find more joy in celebrating their family’s milestones as they age.
Next, I perform topic modelling using Latent Dirichlet Allocation (LDA) to identify the top 10 topics that women in each age group associate with happiness.
Here, we observe an interesting series of trends that reveals previously unexplored topics: women in their 20s appear to find happiness in food, those in their 30s lean towards shopping, and those aged 40 and above seem to derive joy from nature.